- POWERPC 604: A LOOK UNDER THE HOOD. AND A LOOK AT THE FUTURE
-
- (April 29) Initial reaction to the PowerPC 604 has been extremely
- favourable and, as always when a new chip is launched, the world made
- a beeline for Michael Slater, editorial director of the Microprocessor
- Report, for his view. The Report's lead headline last week was "PPC
- 604 Powers Past Pentium - PowerPC Chip Will Open Performance Gap,
- Possibly Permanently". We guess he thinks the 604 is OK. Not only
- does the 604 leave announced Pentia in the dust, but IBM and Motorola
- believe that it will give the forthcoming P6 a run for its money.
- Additionally, there are plans to make the 604 and the 603 smaller by
- incorporating a fifth layer of metal, but more of that later.
-
- There are now nearly 400 engineers working at the Somerset design
- facility and when their latest meisterwerk was launched last week, it
- was accompanied by a wealth of technical detail which goes some way
- to explaining how the chip achieves its performance boost. However,
- there are a few intriguing holes which, as we went to press, still
- had to be filled. For example, the processor apparently plays host to
- some new graphics operations, which were not included in the 601. One
- IBM source suggests that these won't be supported by any general
- purpose compiler, but will be used by some graphic-intensive libraries.
-
- In general, this is what we do know: the 604 has six functional
- units: a floating point unit; a branch processing unit; a
- load/store unit; two standard integer units and one multiple-cycle
- integer unit, used for the rarer division and multiplication
- requirements. The chip tries to dispatch up to four instructions on
- each clock cycle. It has a 64-bit external data bus and a 32-bit address
- bus. In general, the capabilities are closer to the PowerPC 603 than
- the 601. The load/store unit debuted in the 603, for example, and the
- 603 and 604 share a register renaming capability not present in the
- earlier chip.
-
- Where the 603 and 604 differ, however, is in size: where the 603 is a
- petit, bijou chip-ette, measuring 85 square mm and comprising 1.6
- million transistors, the 604 weighs in at 196 square mm with 3.6
- million. This is bigger than the 601 and substantially bigger than
- Intel's P54C Pentium. At 8-10 Watts it is also pretty power-hungry. A
- photo of the chip's layout reveals that one of the biggest items is the
- dispatch and completion unit which has to work out how to issue four
- instructions simultaneously and cope with out-of-order execution.
-
- Following the path of an instruction as it wends its way through the
- processor gives a good feel for how the various parts interact:
-
- We start at the 604's Fetch Unit. Its task is to grab instructions
- from the 16k on-board instruction cache and dump them into the
- eight-entry instruction queue. Under the simplest circumstances the
- addresses would simply be fetched sequentially, but the branch
- prediction unit will often kick in and offer its own suggestion of
- where to go next.
-
- The 604 is the first PowerPC processor to incorporate dynamic branch
- prediction - its predictions adapt as time goes on and the unit
- records which jumps were taken previously. When the 604 finds a
- branch, it predicts the outcome and executes the resultant code,
- storing the results in a parallel set of "rename registers" until it
- is certain whether the prediction was correct. The chip can go two-
- deep in its predictions. Dynamic branch prediction is based around
- two structures: the Branch History Table (BHT) and the branch target
- address cache (BTAC). The BTAC holds the target addresses for 64
- branches that have been taken in the past. The BHT, by comparison, is
- used to predict conditional branches: each of its 512 entries holds a
- two-bit value indicating one of four levels of dynamic prediction
- (strongly not-taken, not-taken, taken and strongly taken). Each time
- the branch is taken, the value is incremented. Each time it is not
- taken, it is decremented.
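-
- The two-bit scheme described above can be sketched in a few lines of
- Python. This is only an illustration, not the 604's actual logic: the
- class and method names are hypothetical, and both the indexing by
- low-order address bits and the clamping of the counter at its two
- extremes are assumptions, since the announcement describes neither.
-
```python
# Hypothetical model of a 512-entry branch history table (BHT) of two-bit
# counters. Indexing by low-order address bits and clamping at 0 and 3 are
# assumptions for illustration; the article does not describe either.
STATES = ["strongly not taken", "not taken", "taken", "strongly taken"]
BHT_ENTRIES = 512


class BranchHistoryTable:
    def __init__(self, entries=BHT_ENTRIES, initial=1):
        self.counters = [initial] * entries

    def _index(self, address):
        # PowerPC instructions are 4 bytes, so drop the two low bits first.
        return (address >> 2) % len(self.counters)

    def predict(self, address):
        """Return True if the branch at this address is predicted taken."""
        return self.counters[self._index(address)] >= 2

    def state(self, address):
        return STATES[self.counters[self._index(address)]]

    def update(self, address, taken):
        """Increment on a taken branch, decrement otherwise, clamped to 0..3."""
        i = self._index(address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


bht = BranchHistoryTable()
branch = 0x1000
for outcome in [True, True, True, False]:
    bht.update(branch, outcome)
print(bht.state(branch), bht.predict(branch))
```
-
- The point of having four levels rather than two is that one stray
- not-taken outcome in a long run of taken branches, such as a loop
- exit, only weakens the prediction and does not flip it.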
-
- The dispatch unit is responsible for allocating each decoded
- instruction to the appropriate execution unit and for allocating a
- place in the completion unit's reorder buffer, while checking for
- dependencies between instructions in the dispatch queue.
-
- We haven't room to go through a full description of all the
- functional units here, but both the integer and floating point units
- have had two-entry reservation stations added to their front end
- which store dispatched instructions that cannot be executed until all
- the source operands are supplied. These reduce stalls in the chip.
- The floating point unit is the first single-pass double precision
- unit to be incorporated into the PowerPC line. This means that both
- single and double precision operations can be issued at a rate of one
- per clock cycle, with a latency of three cycles.
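-
- To see how a one-cycle issue rate differs from a three-cycle latency,
- here is a rough sketch in Python. It assumes an idealised, fully
- pipelined unit with no stalls; the function names are ours and the
- model ignores everything else going on in the chip.
-
```python
# Assumed model: a fully pipelined unit that accepts one independent
# operation per cycle and delivers each result `latency` cycles later.
FPU_LATENCY = 3  # cycles, from the article


def cycles_for_independent_ops(n, latency=FPU_LATENCY):
    """Cycles until the last of n back-to-back independent results is ready."""
    if n == 0:
        return 0
    return latency + (n - 1)


def cycles_for_dependent_chain(n, latency=FPU_LATENCY):
    """Cycles when each operation needs the previous result (no overlap)."""
    return n * latency


for n in (1, 4, 100):
    print(n, cycles_for_independent_ops(n), cycles_for_dependent_chain(n))
```
-
- Under these assumptions, a long run of independent operations
- approaches one result per cycle, while a chain in which each operation
- waits on the one before it pays the full latency every time.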
-
- Since instructions can finish out of order, the completion unit has
- to hold each executed instruction in the reorder buffer until all
- instructions ahead of it have been completed. Once everything is in
- order, the unit writes the instruction's results to the appropriate
- register file and updates any other resources affected. Several
- instructions may complete simultaneously.
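-
- The in-order completion rule can be modelled with a simple queue. The
- Python below is a toy sketch, not a description of the 604's
- completion unit: the class, its method names and the optional
- per-cycle limit are all hypothetical.
-
```python
from collections import deque


class ReorderBuffer:
    """Toy reorder buffer: entries are allocated in program order at dispatch,
    may finish in any order, and retire only from the head."""

    def __init__(self):
        self.entries = deque()   # tags in program order
        self.finished = set()    # tags whose execution has finished

    def dispatch(self, tag):
        self.entries.append(tag)

    def finish(self, tag):
        self.finished.add(tag)

    def retire(self, max_per_cycle=None):
        """Retire finished instructions from the head, stopping at the first
        one still executing. The per-cycle limit is a hypothetical parameter."""
        retired = []
        while self.entries and self.entries[0] in self.finished:
            if max_per_cycle is not None and len(retired) >= max_per_cycle:
                break
            tag = self.entries.popleft()
            self.finished.discard(tag)
            retired.append(tag)
        return retired


rob = ReorderBuffer()
for tag in ["i0", "i1", "i2", "i3"]:
    rob.dispatch(tag)

rob.finish("i2")          # finishes early, but i0 and i1 are still ahead of it
print(rob.retire())       # nothing can retire yet
rob.finish("i0")
print(rob.retire())       # only i0 retires; i1 blocks i2
rob.finish("i1")
print(rob.retire())       # i1 and i2 retire together
```
-
- Retiring only from the head of the queue is what keeps the register
- files looking as though the program ran strictly in order, however
- the execution units actually scheduled the work.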
-
- THE FUTURE
- As we said previously, the 604 is built using the older 0.5 micron
- technology and it doesn't incorporate the new transistor design that
- appeared in the 100MHz PowerPC 601; apparently Motorola doesn't have
- fabs which can cope with this new process yet. However, this is one
- way in which the 604 could get smaller and faster in the future.
- Certainly the 100MHz clock speed is seen as being in the middle of
- the 604's range, so we should expect both slower and faster parts.
-
- In addition, Ian Ferguson, Motorola Ltd's RISC product marketing
- engineer, suggests that the companies are looking to produce future
- versions of the 604 and 603 with an additional, fifth layer of metal
- for chip interconnects. This will make the chip between 5% and 10%
- smaller, he says. The shorter interconnects also have an impact on
- latency and therefore achievable clock speed.
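-
- As a back-of-the-envelope check on what that shrink might mean for
- the 604, the Python below applies the quoted range to the 196 square
- mm die. Reading "smaller" as a reduction in die area is our
- assumption, and the function name is purely illustrative.
-
```python
# Figures from the article: a 196 square mm die, shrinking by 5% to 10%.
# Treating the percentages as reductions in die area is an assumption.
DIE_AREA_MM2 = 196.0


def shrunk_area(area_mm2, reduction_pct):
    """Die area after removing reduction_pct percent of it."""
    return area_mm2 * (1.0 - reduction_pct / 100.0)


for pct in (5, 10):
    print(f"{pct}% smaller: {shrunk_area(DIE_AREA_MM2, pct):.1f} square mm")
```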
-
- In the next issue, we will be examining how critical compiler
- technology is to getting the best out of the PowerPC and the extent
- to which optimising code for one member of the PowerPC family will
- cause a performance hit when run on another.
-
- Finally, for those who missed our original Powerflash with its
- performance comparison table, here it is again:
-
-
- =====================================================================
- POWERFLASH - POWERPC 604 ANNOUNCED
-
- (April 18) As promised Motorola and IBM unveiled the PowerPC 604
- processor today. The 32-bit processor has one floating point unit, but
- three integer units - two for single clock cycle instructions, the
- other for integer multiplication and division. The chip has estimated
- SPECint92 and SPECfp92 ratings of 160 and 165 respectively.
-
- The MPC604 is already sampling in small quantities to highly favoured
- customers, but general sampling is set to begin in the third quarter
- with volume production set for Q4. IBM will manufacture the
- processor at its Burlington, Vermont facility and Motorola at its
- MOS-11 factory in Austin, Texas. No prices were given.
-
- The processor is being manufactured in 0.5 micron CMOS, but it is
- worth noting that the part does not use the new, smaller transistor
- geometry that made its debut in the 100MHz MPC601. So expect smaller
- 604s in the future.
-
- Aimed at the high-end desktop and server market, the new chip
- consumes between 8 Watts and 10 Watts in normal use. A 'nap' mode
- takes consumption down to around 400mW.
-
- Other specifications:
-
- *No. of transistors: 3.6 million
- *Die Size: 12.4mm x 15.8mm
- *Two separate 16k 4-way set associative instruction and data
- caches.
- *Dynamic Branch prediction with 64-entry fully associative branch
- target address cache and 512 entry branch history table.
- *Dispatch Unit has an 8-instruction buffer
-
- The 604 has an onboard phase-locked loop (PLL) which allows the
- processor to be driven at 1x, 1.5x, 2x or 3x the bus speed. The
- estimated benchmark below was for a 100MHz processor being driven at
- 1.5x the 66MHz bus speed.
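-
- The relationship between bus speed and core speed is simple
- multiplication, as the short Python sketch below shows. The
- multipliers are those listed above; taking the nominal 66MHz bus to
- mean 200/3 MHz is our assumption, as is the function name.
-
```python
# Multipliers listed in the article; the bus frequency used below is an
# assumed exact value for the nominal 66MHz bus the article mentions.
PLL_MULTIPLIERS = (1.0, 1.5, 2.0, 3.0)


def core_clock_mhz(bus_mhz, multiplier):
    """Core clock for a given bus clock and PLL multiplier."""
    if multiplier not in PLL_MULTIPLIERS:
        raise ValueError(f"unsupported multiplier: {multiplier}")
    return bus_mhz * multiplier


# A nominal "66MHz" bus is taken here to mean 200/3 MHz (an assumption).
bus = 200 / 3
for m in PLL_MULTIPLIERS:
    print(f"{m}x -> {core_clock_mhz(bus, m):.1f} MHz")
```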
-
- The new performance comparison table looks like this:
-
- +-----------------+------------+------------+
- | Processor       | SPECint92  | SPECfp92   |
- |=================|============|============|
- | MPC 601  50 MHz |      51    |       6    |
- |          66 MHz |      62    |      80    |
- |          80 MHz |      80    |     105    |
- |         100 MHz |     110    |     130    |
- |-----------------|------------|------------|
- | MPC 604 100 MHz |     160    |     165    |
- |-----------------|------------|------------|
- | MPC 603  66 MHz |      60    |      70    |
- |          80 MHz |      75    |      85    |
- |-----------------|------------|------------|
- | Pentium  90 MHz |      90    |    72.7    |
- |         100 MHz |     100    |    80.6    |
- +-----------------+------------+------------+
-
- (c)PowerPC News - Free by mailing: add@power.globalnews.com
-
-